Genetic and epigenetic fine mapping of causal autoimmune disease variants

Supplemental Table 1 lists the genomic coordinates of the disease-associated SNPs.

Original overlap analysis

We visualize the clustering of disease-specific SNP sets based on the number of overlapping SNPs.
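The overlap counts that feed such a clustering can be sketched as follows. This is a minimal illustration, not the code used for the analysis: it assumes each SNP set is a collection of identifiers and that two SNPs overlap when their identifiers match. The function name `overlap_counts`, the disease labels, and the rsIDs are all hypothetical.

```python
from itertools import combinations


def overlap_counts(snp_sets):
    """Return {(name_a, name_b): number of shared SNPs} for every unordered pair."""
    counts = {}
    for a, b in combinations(sorted(snp_sets), 2):
        counts[(a, b)] = len(snp_sets[a] & snp_sets[b])
    return counts


# Hypothetical toy data: names and rsIDs are made up for illustration only.
snp_sets = {
    "Disease_A": {"rs1", "rs2", "rs3"},
    "Disease_B": {"rs2", "rs3", "rs4"},
    "Disease_C": {"rs9"},
}
counts = overlap_counts(snp_sets)
for pair, shared in counts.items():
    print(pair, shared)
```

The resulting pairwise counts can be arranged into a symmetric matrix and passed to any hierarchical clustering routine.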

Analysis of TFBSs

Out of all regulatory datasets, we select only TFBSs.

## [1] 1954   39
## [1] 1259   39
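One way to make such a selection is by track name, since the ENCODE track identifiers that appear in the tables of this report carry a category tag such as `Tfbs` or `Histone`. The sketch below assumes name-based selection, which the report does not state explicitly; `select_tracks` is a hypothetical helper. It does not reproduce the further reduction in row count shown above, whose criterion is not described here.

```python
def select_tracks(track_names, marker):
    """Keep only the track names that contain the given category marker."""
    return [name for name in track_names if marker in name]


# Track identifiers copied from the result tables in this report.
tracks = [
    "wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2",
    "wgEncodeSydhTfbsGm18951NfkbTnfaIggrabPk",
    "wgEncodeBroadHistoneGm12878H3k9acStdPk",
    "wgEncodeUwHistoneGm12875H3k04me3StdHotspotsRep1",
]
tfbs_tracks = select_tracks(tracks, "Tfbs")
histone_tracks = select_tracks(tracks, "Histone")
print(len(tfbs_tracks), len(histone_tracks))
```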

We check how regulatory similarity correlates with overlap similarity.

##      x    y
## x 1.00 0.28
## y 0.28 1.00
## 
## n= 1482 
## 
## 
## P
##   x  y 
## x     0
## y  0
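The sample size n = 1482 is consistent with the 39 × 38 ordered off-diagonal pairs of a 39 × 39 similarity matrix; that reading is an assumption, as is the use of the Pearson coefficient. Under those assumptions, the comparison can be sketched by flattening the off-diagonal entries of both matrices and correlating the two vectors. The toy matrices and the function names are hypothetical.

```python
from math import sqrt


def off_diagonal(matrix):
    """Flatten a square matrix, skipping the diagonal (self-similarity) entries."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical 3 x 3 similarity matrices for illustration only.
overlap = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]]
regulatory = [[1.0, 0.6, 0.0], [0.6, 1.0, 0.3], [0.0, 0.3, 1.0]]

x = off_diagonal(overlap)
y = off_diagonal(regulatory)
r = pearson(x, y)
print(len(x), round(r, 3))
```

Because every unordered pair appears twice in the flattened vectors, the coefficient is unchanged but the nominal sample size is doubled, which makes the reported p-value smaller than it would be for unique pairs.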

Next, we visualize a heatmap of regulatory similarity.

Text mining question 1: Do diseases within a cluster share stronger literature similarity than diseases in different clusters? To answer this, we need a literature similarity score for each pair of diseases. We then split the pairs into cluster-specific groups and compare each group's score distribution with what is expected by chance, calculating a p-value for the difference. Expected answer: diseases within each cluster are related to each other by literature findings more strongly than expected by chance. Diseases in different clusters are not related to each other by literature findings, and this lack of relation may also be statistically significant.
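One possible way to frame this comparison is a label-permutation test: compute the difference between the mean within-cluster and the mean between-cluster similarity, then shuffle the cluster labels to see how often chance alone gives a difference at least as large. The sketch below is an assumption about how the question could be tested, not a description of an existing implementation; the similarity scores, disease names, and function names are hypothetical.

```python
import random
from itertools import combinations


def within_minus_between(sim, labels):
    """Mean similarity of same-cluster pairs minus mean of different-cluster pairs."""
    within, between = [], []
    for a, b in combinations(sorted(labels), 2):
        score = sim[frozenset((a, b))]
        (within if labels[a] == labels[b] else between).append(score)
    return sum(within) / len(within) - sum(between) / len(between)


def permutation_p_value(sim, labels, n_perm=2000, seed=0):
    """One-sided p-value for the observed within-minus-between difference."""
    rng = random.Random(seed)
    observed = within_minus_between(sim, labels)
    names = sorted(labels)
    values = [labels[name] for name in names]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # keeps cluster sizes, breaks the disease-to-cluster link
        shuffled = dict(zip(names, values))
        if within_minus_between(sim, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical literature similarity scores for four diseases in two clusters.
labels = {"A": 1, "B": 1, "C": 2, "D": 2}
sim = {frozenset(pair): 0.1 for pair in combinations("ABCD", 2)}
sim[frozenset("AB")] = 0.9
sim[frozenset("CD")] = 0.8

observed, p_value = permutation_p_value(sim, labels)
print(round(observed, 2), round(p_value, 3))
```

With only four diseases there are just three distinct ways to split them into two pairs, so the p-value in this toy case cannot fall much below one third; with 39 diseases the permutation distribution is far richer.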

The 10 most similar pairs of disease-associated SNP sets, ranked by correlation coefficient:

## 
## -----------------------------------------------------------------------------------------------
##                   Disease 1                             Disease 2            Corr. coefficient 
## ---------------------------------------------- ---------------------------- -------------------
##                HDL_cholesterol                        Triglycerides               0.5484       
## 
##                Kawasaki_disease                Systemic_lupus_erythematosus       0.5352       
## 
##              Bone_mineral_density                    Type_2_diabetes              0.5268       
## 
##                Kawasaki_disease                     Multiple_sclerosis             0.501       
## 
##                Kawasaki_disease                    Rheumatoid_arthritis           0.4775       
## 
##                 Celiac_disease                       Kawasaki_disease             0.4754       
## 
##                LDL_cholesterol                        Triglycerides               0.4743       
## 
##                Kawasaki_disease                     Ulcerative_colitis            0.4661       
## 
## Liver_enzyme_levels_gamma_glutamyl_transferase         Urate_levels               0.4191       
## 
##              Alzheimers_combined                   Bone_mineral_density           0.4149       
## -----------------------------------------------------------------------------------------------
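A ranking like the one above can be produced by listing each unordered pair of a symmetric correlation matrix once and sorting by coefficient. The sketch below uses a hypothetical three-trait matrix and a hypothetical helper `top_pairs`; the values are not taken from the analysis.

```python
def top_pairs(names, corr, n=10):
    """Return the n pairs with the highest correlation as (name_a, name_b, r)."""
    pairs = [
        (corr[i][j], names[i], names[j])
        for i in range(len(names))
        for j in range(i + 1, len(names))  # upper triangle only, no self-pairs
    ]
    pairs.sort(key=lambda item: item[0], reverse=True)
    return [(a, b, r) for r, a, b in pairs[:n]]


# Hypothetical names and correlation values for illustration only.
names = ["Trait_A", "Trait_B", "Trait_C"]
corr = [
    [1.00, 0.55, 0.10],
    [0.55, 1.00, 0.47],
    [0.10, 0.47, 1.00],
]
ranked = top_pairs(names, corr, n=2)
for a, b, r in ranked:
    print(a, b, r)
```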

The similarity dendrogram can be divided into separate groups:

## Cluster01 has   8 members 
## Kawasaki_disease
## Systemic_lupus_erythematosus
## Celiac_disease
## Ulcerative_colitis
## Psoriasis
## Multiple_sclerosis
## Rheumatoid_arthritis
## Allergy
##  
## Cluster02 has   9 members 
## Systemic_sclerosis
## Primary_biliary_cirrhosis
## Atopic_dermatitis
## Juvenile_idiopathic_arthritis
## Ankylosing_spondylitis
## Crohns_disease
## Type_1_diabetes
## Autoimmune_thyroiditis
## Primary_sclerosing_cholangitis
##  
## Cluster03 has  10 members 
## Urate_levels
## Liver_enzyme_levels_gamma_glutamyl_transferase
## LDL_cholesterol
## HDL_cholesterol
## Triglycerides
## Renal_function_related_traits_BUN
## Platelet_counts
## Red_blood_cell_traits
## C_reactive_protein
## Fasting_glucose_related_traits
##  
## Cluster04 has  12 members 
## Chronic_kidney_disease
## Alzheimers_combined
## Bone_mineral_density
## Type_2_diabetes
## Vitiligo
## Migraine
## Alopecia_areata
## Asthma
## Creatinine_levels
## Behcets_disease
## Progressive_supranuclear_palsy
## Restless_legs_syndrome
## 

The “Enrichment 1/2” columns (labeled with the cluster names, e.g. c1 and c2, in the tables below) show the average p-values of the group-specific SNP-regulatory associations. A “-” sign indicates that an association is underrepresented. The “p-value” column (adj.P.Val) shows whether the associations differ significantly between the groups.
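To make the sign convention concrete, the sketch below converts a signed p-value into a signed score, positive for overrepresented and negative for underrepresented associations, and applies a Benjamini-Hochberg adjustment to a list of raw p-values. Both steps are assumptions made for illustration: the report does not state that a -log10 transform was used, nor which multiple-testing correction produced the adjusted p-values. The function names and input values are hypothetical.

```python
from math import log10


def signed_score(signed_p):
    """Map a signed p-value to -log10(|p|), keeping the sign of the association."""
    magnitude = -log10(abs(signed_p))
    return magnitude if signed_p > 0 else -magnitude


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical inputs for illustration only.
scores = [signed_score(p) for p in (0.001, -0.01)]
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(scores, adjusted)
```

On the score scale, a strongly overrepresented association and a strongly underrepresented one sit far apart, whereas the raw signed p-values 0.001 and -0.001 are numerically close.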

## [1] "c1 vs. c2 , number of degs significant at adj.p.val<0.5: 54"
## 
## ---------------------------------------------------------------------------------------------------------
##                     Row.names                         c1       c2    adj.P.Val             V2            
## -------------------------------------------------- --------- ------ ----------- -------------------------
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2 0.0006292 0.8227  0.0001666   GM12878 RUNX3 v042211.1 
##                                                                                 ChIP-seq Peaks Rep 2 from
##                                                                                        ENCODE/HAIB       
## 
##      wgEncodeSydhTfbsGm18951NfkbTnfaIggrabPk       0.008529  0.9186  0.0002231  GM18951 NFKB IgG-rab TNFa
##                                                                                    ChIP-seq Peaks from   
##                                                                                        ENCODE/SYDH       
## 
##  wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1   0.001167  0.6894  0.0002231   GM12878 FOXM1 v042211.1 
##                                                                                 ChIP-seq Peaks Rep 1 from
##                                                                                        ENCODE/HAIB       
## 
##      wgEncodeSydhTfbsGm19099NfkbTnfaIggrabPk       0.008903  0.9673  0.0002231  GM19099 NFKB IgG-rab TNFa
##                                                                                    ChIP-seq Peaks from   
##                                                                                        ENCODE/SYDH       
## 
##  wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2   0.0001479 0.6107  0.0002929    GM12878 PML v042211.1  
##                                                                                 ChIP-seq Peaks Rep 2 from
##                                                                                        ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1  0.0008332 0.8376  0.0002929   GM12878 ATF2 v042211.1  
##                                                                                 ChIP-seq Peaks Rep 1 from
##                                                                                        ENCODE/HAIB       
## 
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep1 0.003957  0.942   0.0003906  GM12878 STAT5A v042211.1 
##                                                                                 ChIP-seq Peaks Rep 1 from
##                                                                                        ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1  6.767e-05 0.4192  0.0004104   GM12878 NFIC v042211.1  
##                                                                                 ChIP-seq Peaks Rep 1 from
##                                                                                        ENCODE/HAIB       
## 
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2 0.000599  0.6266  0.0004104  GM12878 STAT5A v042211.1 
##                                                                                 ChIP-seq Peaks Rep 2 from
##                                                                                        ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep2  0.003147  0.7082  0.0004104   GM12878 ATF2 v042211.1  
##                                                                                 ChIP-seq Peaks Rep 2 from
##                                                                                        ENCODE/HAIB       
## ---------------------------------------------------------------------------------------------------------
## 
## [1] "c1 vs. c3 , number of degs significant at adj.p.val<0.5: 56"
## 
## ----------------------------------------------------------------------------------------------------------
##                     Row.names                         c1       c3     adj.P.Val             V2            
## -------------------------------------------------- --------- ------- ----------- -------------------------
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2 0.0006292 -0.3199  1.525e-06   GM12878 RUNX3 v042211.1 
##                                                                                  ChIP-seq Peaks Rep 2 from
##                                                                                         ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1   0.001167  -0.2378  1.978e-06   GM12878 FOXM1 v042211.1 
##                                                                                  ChIP-seq Peaks Rep 1 from
##                                                                                         ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1  6.767e-05 -0.1347  3.878e-06   GM12878 NFIC v042211.1  
##                                                                                  ChIP-seq Peaks Rep 1 from
##                                                                                         ENCODE/HAIB       
## 
##      wgEncodeSydhTfbsGm18951NfkbTnfaIggrabPk       0.008529  -0.6289  5.781e-06  GM18951 NFKB IgG-rab TNFa
##                                                                                     ChIP-seq Peaks from   
##                                                                                         ENCODE/SYDH       
## 
##  wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2   0.0001479 -0.3198  8.173e-06    GM12878 PML v042211.1  
##                                                                                  ChIP-seq Peaks Rep 2 from
##                                                                                         ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep2  0.003147  -0.3812  8.605e-06   GM12878 ATF2 v042211.1  
##                                                                                  ChIP-seq Peaks Rep 2 from
##                                                                                         ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1  0.0008332 -0.3287  8.605e-06   GM12878 ATF2 v042211.1  
##                                                                                  ChIP-seq Peaks Rep 1 from
##                                                                                         ENCODE/HAIB       
## 
##      wgEncodeSydhTfbsGm19099NfkbTnfaIggrabPk       0.008903  -0.5609  9.529e-06  GM19099 NFKB IgG-rab TNFa
##                                                                                     ChIP-seq Peaks from   
##                                                                                         ENCODE/SYDH       
## 
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep1 0.003957  -0.3773  9.887e-06  GM12878 STAT5A v042211.1 
##                                                                                  ChIP-seq Peaks Rep 1 from
##                                                                                         ENCODE/HAIB       
## 
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2 0.000599  -0.4904  1.692e-05  GM12878 STAT5A v042211.1 
##                                                                                  ChIP-seq Peaks Rep 2 from
##                                                                                         ENCODE/HAIB       
## ----------------------------------------------------------------------------------------------------------
## 
## [1] "c1 vs. c4 , number of degs significant at adj.p.val<0.5: 55"
## 
## -----------------------------------------------------------------------------------------------------------
##                     Row.names                         c1        c4     adj.P.Val             V2            
## -------------------------------------------------- --------- -------- ----------- -------------------------
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2 0.0006292 -0.4205   1.012e-06   GM12878 RUNX3 v042211.1 
##                                                                                   ChIP-seq Peaks Rep 2 from
##                                                                                          ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2   0.0001479  -0.243   3.42e-06     GM12878 PML v042211.1  
##                                                                                   ChIP-seq Peaks Rep 2 from
##                                                                                          ENCODE/HAIB       
## 
##      wgEncodeSydhTfbsGm18951NfkbTnfaIggrabPk       0.008529  -0.6439   3.42e-06   GM18951 NFKB IgG-rab TNFa
##                                                                                      ChIP-seq Peaks from   
##                                                                                          ENCODE/SYDH       
## 
##  wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1   0.001167  -0.5047   3.42e-06    GM12878 FOXM1 v042211.1 
##                                                                                   ChIP-seq Peaks Rep 1 from
##                                                                                          ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1  6.767e-05 -0.2602   3.42e-06    GM12878 NFIC v042211.1  
##                                                                                   ChIP-seq Peaks Rep 1 from
##                                                                                          ENCODE/HAIB       
## 
##      wgEncodeSydhTfbsGm19099NfkbTnfaIggrabPk       0.008903  -0.5014   3.42e-06   GM19099 NFKB IgG-rab TNFa
##                                                                                      ChIP-seq Peaks from   
##                                                                                          ENCODE/SYDH       
## 
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2 0.000599  -0.3133   3.42e-06   GM12878 STAT5A v042211.1 
##                                                                                   ChIP-seq Peaks Rep 2 from
##                                                                                          ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep2  0.003147  -0.4064   3.42e-06    GM12878 ATF2 v042211.1  
##                                                                                   ChIP-seq Peaks Rep 2 from
##                                                                                          ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1  0.0008332 -0.4112   4.748e-06   GM12878 ATF2 v042211.1  
##                                                                                   ChIP-seq Peaks Rep 1 from
##                                                                                          ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Mta3sc81325V0422111PkRep2  2.011e-05 -0.09226  5.47e-06    GM12878 MTA3 v042211.1  
##                                                                                   ChIP-seq Peaks Rep 2 from
##                                                                                          ENCODE/HAIB       
## -----------------------------------------------------------------------------------------------------------
## [1] "Counts of regulatory elements differentially associated with each group"
## 
## ----------------------------
##           c1   c2   c3   c4 
## -------- ---- ---- ---- ----
##  **c1**   0    54   56   55 
## 
##  **c2**   0    0    0    0  
## 
##  **c3**   0    0    0    0  
## 
##  **c4**   0    0    0    0  
## ----------------------------

Text mining question 2: Are the terms more strongly associated with the diseases in one cluster than with those in another, judging by the strength of the literature associations? Are the terms themselves related to each other in the literature? Expected answer: yes, the literature associations should confirm the relationships.

Summary

  1. There are 4 clusters. The first cluster drives all the differences.

|    | C1 | C2 | C3 | C4 |
|----|----|----|----|----|
| C1 |    | Cell types: Gm12878; Reg: NFkB, Pol2, MTA3, NFIC, NFATC1 | Cell types: Gm12878; Reg: NFkB, Pol2, MTA3, NFIC, NFATC1 | Cell types: Gm12878; Reg: NFkB, Pol2, MTA3, NFIC, NFATC1 |
| C2 |    |    | Nothing significant | Nothing significant |
| C3 |    |    |    | Nothing significant |
| C4 |    |    |    |    |

Analysis of histone marks

Out of all regulatory datasets, we select only histone marks.

## [1] 721  39
## [1] 610  39

We check how regulatory similarity correlates with overlap similarity.

##      x    y
## x 1.00 0.23
## y 0.23 1.00
## 
## n= 1482 
## 
## 
## P
##   x  y 
## x     0
## y  0

Next, we visualize a heatmap of regulatory similarity.

Text mining question 1: Do diseases within a cluster share stronger literature similarity than diseases in different clusters? To answer this, we need a literature similarity score for each pair of diseases. We then split the pairs into cluster-specific groups and compare each group's score distribution with what is expected by chance, calculating a p-value for the difference. Expected answer: diseases within each cluster are related to each other by literature findings more strongly than expected by chance. Diseases in different clusters are not related to each other by literature findings, and this lack of relation may also be statistically significant.

The 10 most similar pairs of disease-associated SNP sets, ranked by correlation coefficient:

## 
## ---------------------------------------------------------------------------------------
##             Disease 1                         Disease 2              Corr. coefficient 
## --------------------------------- --------------------------------- -------------------
##          HDL_cholesterol                    Triglycerides                  0.621       
## 
##       Rheumatoid_arthritis               Ulcerative_colitis               0.4856       
## 
##          HDL_cholesterol                   LDL_cholesterol                 0.48        
## 
##          HDL_cholesterol                   Platelet_counts                0.4609       
## 
##          Platelet_counts                    Triglycerides                 0.4504       
## 
##          LDL_cholesterol                    Triglycerides                 0.4151       
## 
##         Creatinine_levels         Renal_function_related_traits_BUN       0.3915       
## 
##             Psoriasis               Systemic_lupus_erythematosus          0.3911       
## 
## Renal_function_related_traits_BUN           Urate_levels                  0.3689       
## 
##          Alopecia_areata                 C_reactive_protein               0.3686       
## ---------------------------------------------------------------------------------------

The similarity dendrogram can be divided into separate groups:

## Cluster01 has   6 members 
## Celiac_disease
## Multiple_sclerosis
## Kawasaki_disease
## Primary_biliary_cirrhosis
## Systemic_lupus_erythematosus
## Psoriasis
##  
## Cluster02 has  14 members 
## Type_2_diabetes
## Fasting_glucose_related_traits
## Red_blood_cell_traits
## Crohns_disease
## Migraine
## Systemic_sclerosis
## Ankylosing_spondylitis
## Platelet_counts
## Triglycerides
## HDL_cholesterol
## Vitiligo
## Progressive_supranuclear_palsy
## Liver_enzyme_levels_gamma_glutamyl_transferase
## LDL_cholesterol
##  
## Cluster03 has  11 members 
## Allergy
## Type_1_diabetes
## Primary_sclerosing_cholangitis
## Juvenile_idiopathic_arthritis
## Behcets_disease
## Ulcerative_colitis
## Rheumatoid_arthritis
## Autoimmune_thyroiditis
## Alopecia_areata
## C_reactive_protein
## Asthma
##  
## Cluster04 has   8 members 
## Bone_mineral_density
## Chronic_kidney_disease
## Alzheimers_combined
## Restless_legs_syndrome
## Atopic_dermatitis
## Urate_levels
## Renal_function_related_traits_BUN
## Creatinine_levels
## 

The “Enrichment 1/2” columns (labeled with the cluster names, e.g. c1 and c2, in the tables below) show the average p-values of the group-specific SNP-regulatory associations. A “-” sign indicates that an association is underrepresented. The “p-value” column (adj.P.Val) shows whether the associations differ significantly between the groups.

## [1] "c1 vs. c2 , number of degs significant at adj.p.val<0.5: 44"
## 
## ------------------------------------------------------------------------------------------------------------
##                    Row.names                       c1        c2     adj.P.Val               V2              
## ----------------------------------------------- --------- -------- ----------- -----------------------------
## wgEncodeUwHistoneGm12875H3k04me3StdHotspotsRep1 0.0005212 -0.5773   5.236e-08   GM12875 H3K4me3 Histone Mod 
##                                                                                  ChIP-seq Hotspots 1 from   
##                                                                                          ENCODE/UW          
## 
##     wgEncodeBroadHistoneGm12878H3k9acStdPk      3.849e-12  -0.205   8.601e-07   GM12878 H3K9ac Histone Mods 
##                                                                                   by ChIP-seq Peaks from    
##                                                                                        ENCODE/Broad         
## 
##     wgEncodeBroadHistoneGm12878H3k4me2StdPk     8.782e-09 -0.06392  8.601e-07  GM12878 H3K4me2 Histone Mods 
##                                                                                   by ChIP-seq Peaks from    
##                                                                                        ENCODE/Broad         
## 
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep2 5.452e-07 -0.3131   1.346e-06   GM12865 H3K4me3 Histone Mod 
##                                                                                  ChIP-seq Hotspots 2 from   
##                                                                                          ENCODE/UW          
## 
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep1 2.881e-06 -0.3862   7.309e-06   GM12865 H3K4me3 Histone Mod 
##                                                                                  ChIP-seq Hotspots 1 from   
##                                                                                          ENCODE/UW          
## 
## wgEncodeUwHistoneGm12864H3k04me3StdHotspotsRep2 0.003202  -0.7181   9.149e-06   GM12864 H3K4me3 Histone Mod 
##                                                                                  ChIP-seq Hotspots 2 from   
##                                                                                          ENCODE/UW          
## 
##       wgEncodeBroadHistoneDnd41H3k09acPk        0.0001527 -0.6869   1.924e-05  Dnd41 H3K9ac Histone Mods by 
##                                                                                     ChIP-seq Peaks from     
##                                                                                        ENCODE/Broad         
## 
##   wgEncodeBroadHistoneGm12878H3k04me3StdPkV2    4.708e-08  -0.167   2.222e-05  GM12878 H3K4me3 Histone Mods 
##                                                                                   by ChIP-seq Peaks from    
##                                                                                        ENCODE/Broad         
## 
##    wgEncodeBroadHistoneGm12878H3k79me2StdPk     1.867e-08 -0.9988   3.557e-05  GM12878 H3K79me2 Histone Mods
##                                                                                   by ChIP-seq Peaks from    
##                                                                                        ENCODE/Broad         
## 
##   wgEncodeBroadHistoneGm12878H3k04me1StdPkV2    6.263e-15 -0.01015  3.557e-05  GM12878 H3K4me1 Histone Mods 
##                                                                                   by ChIP-seq Peaks from    
##                                                                                        ENCODE/Broad         
## ------------------------------------------------------------------------------------------------------------
## 
## [1] "c1 vs. c3 , number of degs significant at adj.p.val<0.5: 54"
## 
## -----------------------------------------------------------------------------------------------------------
##                    Row.names                       c1       c3     adj.P.Val               V2              
## ----------------------------------------------- --------- ------- ----------- -----------------------------
## wgEncodeUwHistoneGm12875H3k04me3StdHotspotsRep1 0.0005212 0.8142   1.464e-06   GM12875 H3K4me3 Histone Mod 
##                                                                                 ChIP-seq Hotspots 1 from   
##                                                                                         ENCODE/UW          
## 
##     wgEncodeBroadHistoneGm12878H3k9acStdPk      3.849e-12 0.2791   2.209e-05   GM12878 H3K9ac Histone Mods 
##                                                                                  by ChIP-seq Peaks from    
##                                                                                       ENCODE/Broad         
## 
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep2 5.452e-07 0.6608   3.836e-05   GM12865 H3K4me3 Histone Mod 
##                                                                                 ChIP-seq Hotspots 2 from   
##                                                                                         ENCODE/UW          
## 
##     wgEncodeBroadHistoneGm12878H3k4me2StdPk     8.782e-09 0.5631   5.197e-05  GM12878 H3K4me2 Histone Mods 
##                                                                                  by ChIP-seq Peaks from    
##                                                                                       ENCODE/Broad         
## 
##    wgEncodeBroadHistoneGm12878H3k79me2StdPk     1.867e-08 -0.4995  7.28e-05   GM12878 H3K79me2 Histone Mods
##                                                                                  by ChIP-seq Peaks from    
##                                                                                       ENCODE/Broad         
## 
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep1 2.881e-06 0.7926   7.28e-05    GM12865 H3K4me3 Histone Mod 
##                                                                                 ChIP-seq Hotspots 1 from   
##                                                                                         ENCODE/UW          
## 
## wgEncodeUwHistoneGm12864H3k04me3StdHotspotsRep2 0.003202  0.9054   7.28e-05    GM12864 H3K4me3 Histone Mod 
##                                                                                 ChIP-seq Hotspots 2 from   
##                                                                                         ENCODE/UW          
## 
##       wgEncodeBroadHistoneDnd41H3k09acPk        0.0001527 -0.9118  7.28e-05   Dnd41 H3K9ac Histone Mods by 
##                                                                                    ChIP-seq Peaks from     
##                                                                                       ENCODE/Broad         
## 
##       wgEncodeBroadHistoneDnd41H3k04me1Pk       9.105e-08 -0.7878  0.0001171  Dnd41 H3K4me1 Histone Mods by
##                                                                                    ChIP-seq Peaks from     
##                                                                                       ENCODE/Broad         
## 
## wgEncodeUwHistoneGm06990H3k4me3StdHotspotsRep1  0.0001636 -0.9922  0.0005313   GM06990 H3K4me3 Histone Mod 
##                                                                                 ChIP-seq Hotspots 1 from   
##                                                                                         ENCODE/UW          
## -----------------------------------------------------------------------------------------------------------
## 
## [1] "c1 vs. c4 , number of degs significant at adj.p.val<0.5: 55"
## 
## -------------------------------------------------------------------------------------------------------------
##                    Row.names                       c1        c4      adj.P.Val               V2              
## ----------------------------------------------- --------- --------- ----------- -----------------------------
## wgEncodeUwHistoneGm12875H3k04me3StdHotspotsRep1 0.0005212  -0.3919   2.056e-07   GM12875 H3K4me3 Histone Mod 
##                                                                                   ChIP-seq Hotspots 1 from   
##                                                                                           ENCODE/UW          
## 
##     wgEncodeBroadHistoneGm12878H3k4me2StdPk     8.782e-09 -0.02811   4.143e-06  GM12878 H3K4me2 Histone Mods 
##                                                                                    by ChIP-seq Peaks from    
##                                                                                         ENCODE/Broad         
## 
##     wgEncodeBroadHistoneGm12878H3k9acStdPk      3.849e-12  -0.1383   4.143e-06   GM12878 H3K9ac Histone Mods 
##                                                                                    by ChIP-seq Peaks from    
##                                                                                         ENCODE/Broad         
## 
##    wgEncodeBroadHistoneGm12878H3k79me2StdPk     1.867e-08 -0.005656  5.047e-06  GM12878 H3K79me2 Histone Mods
##                                                                                    by ChIP-seq Peaks from    
##                                                                                         ENCODE/Broad         
## 
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep2 5.452e-07  -0.2947   8.699e-06   GM12865 H3K4me3 Histone Mod 
##                                                                                   ChIP-seq Hotspots 2 from   
##                                                                                           ENCODE/UW          
## 
##   wgEncodeBroadHistoneGm12878H3k04me3StdPkV2    4.708e-08 -0.01621   2.43e-05   GM12878 H3K4me3 Histone Mods 
##                                                                                    by ChIP-seq Peaks from    
##                                                                                         ENCODE/Broad         
## 
## wgEncodeUwHistoneGm12864H3k04me3StdHotspotsRep2 0.003202   -0.5261   2.535e-05   GM12864 H3K4me3 Histone Mod 
##                                                                                   ChIP-seq Hotspots 2 from   
##                                                                                           ENCODE/UW          
## 
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep1 2.881e-06  -0.3212   2.572e-05   GM12865 H3K4me3 Histone Mod 
##                                                                                   ChIP-seq Hotspots 1 from   
##                                                                                           ENCODE/UW          
## 
##       wgEncodeBroadHistoneDnd41H3k09acPk        0.0001527  -0.3194   2.572e-05  Dnd41 H3K9ac Histone Mods by 
##                                                                                      ChIP-seq Peaks from     
##                                                                                         ENCODE/Broad         
## 
##       wgEncodeBroadHistoneDnd41H3k04me1Pk       9.105e-08 -0.06965   3.248e-05  Dnd41 H3K4me1 Histone Mods by
##                                                                                      ChIP-seq Peaks from     
##                                                                                         ENCODE/Broad         
## -------------------------------------------------------------------------------------------------------------
## 
## [1] "c2 vs. c3 , number of degs significant at adj.p.val<0.5: 18"
## 
## -------------------------------------------------------------------------------------------------------
##                 Row.names                     c2       c3      adj.P.Val               V2              
## ------------------------------------------ -------- --------- ----------- -----------------------------
## wgEncodeBroadHistoneA549H3k79me2Dex100nmPk 0.008678 -0.02115    0.01579     A549 DEX 100 nM H3K79me2   
##                                                                             Histone Mods by ChIP-seq   
##                                                                              Peaks from ENCODE/Broad   
## 
##   wgEncodeBroadHistoneHsmmH3k27me3StdPk    -0.02327 3.654e-07   0.02165   HSMM H3K27me3 Histone Mods by
##                                                                                ChIP-seq Peaks from     
##                                                                                   ENCODE/Broad         
## 
##    wgEncodeBroadHistoneNhaH3k27me3StdPk    -0.01506 0.0001926   0.02689   NH-A H3K27me3 Histone Mods by
##                                                                                ChIP-seq Peaks from     
##                                                                                   ENCODE/Broad         
## 
## wgEncodeBroadHistoneA549H3k36me3Dex100nmPk  0.1047  -0.004845   0.02689     A549 DEX 100 nM H3K36me3   
##                                                                             Histone Mods by ChIP-seq   
##                                                                              Peaks from ENCODE/Broad   
## 
##     wgEncodeBroadHistoneNhlfH3k79me2Pk     0.005332 -0.05179    0.03198   NHLF H3K79me2 Histone Mods by
##                                                                                ChIP-seq Peaks from     
##                                                                                   ENCODE/Broad         
## 
##   wgEncodeBroadHistoneK562H3k36me3StdPk    0.001587 -0.009793   0.03505   K562 H3K36me3 Histone Mods by
##                                                                                ChIP-seq Peaks from     
##                                                                                   ENCODE/Broad         
## 
##    wgEncodeBroadHistoneHsmmtH3k09me3Pk     -0.02596 0.003835    0.04333   HSMMtube H3K9me3 Histone Mods
##                                                                              by ChIP-seq Peaks from    
##                                                                                   ENCODE/Broad         
## 
##  wgEncodeBroadHistoneA549H3k27me3Etoh02Pk  -0.01807 0.002645    0.04472     A549 EtOH 0.02% H3K27me3   
##                                                                             Histone Mods by ChIP-seq   
##                                                                              Peaks from ENCODE/Broad   
## 
##    wgEncodeBroadHistoneHsmmtH3k27me3Pk     -0.01922 0.001151    0.04472     HSMMtube H3K27me3 Histone  
##                                                                            Mods by ChIP-seq Peaks from 
##                                                                                   ENCODE/Broad         
## 
##    wgEncodeBroadHistoneNhdfadH4k20me1Pk    0.001681 -0.02737    0.04472   NHDF-Ad H4K20me1 Histone Mods
##                                                                              by ChIP-seq Peaks from    
##                                                                                   ENCODE/Broad         
## -------------------------------------------------------------------------------------------------------
## 
## [1] "c2 vs. c4 , number of degs significant at adj.p.val<0.5: 5"
## 
## -----------------------------------------------------------------------------------------------------
##                Row.names                   c2        c4      adj.P.Val               V2              
## ---------------------------------------- ------- ---------- ----------- -----------------------------
## wgEncodeBroadHistoneNhdfadH3k36me3StdPk  0.1009  -0.005215    0.02027   NHDF-Ad H3K36me3 Histone Mods
##                                                                            by ChIP-seq Peaks from    
##                                                                                 ENCODE/Broad         
## 
##   wgEncodeBroadHistoneNhekH3k9me1StdPk   0.2002  -0.001186    0.02253   NHEK H3K9me1 Histone Mods by 
##                                                                              ChIP-seq Peaks from     
##                                                                                 ENCODE/Broad         
## 
##  wgEncodeBroadHistoneHmecH3k36me3StdPk   0.01773 -0.0002847   0.04273   HMEC H3K36me3 Histone Mods by
##                                                                              ChIP-seq Peaks from     
##                                                                                 ENCODE/Broad         
## 
## wgEncodeBroadHistoneGm12878H3k36me3StdPk 0.2219  -0.001063    0.05838   GM12878 H3K36me3 Histone Mods
##                                                                            by ChIP-seq Peaks from    
##                                                                                 ENCODE/Broad         
## 
##      wgEncodeBroadHistoneK562NcorPk      0.04478 -0.005174    0.0846      K562 NCoR Histone Mods by  
##                                                                              ChIP-seq Peaks from     
##                                                                                 ENCODE/Broad         
## -----------------------------------------------------------------------------------------------------
## [1] "Counts of regulatory elements differentially associated with each group"
## 
## ----------------------------
##  &nbsp;   c1   c2   c3   c4 
## -------- ---- ---- ---- ----
##  **c1**   0    44   54   55 
## 
##  **c2**   0    0    18   5  
## 
##  **c3**   0    0    0    0  
## 
##  **c4**   0    0    0    0  
## ----------------------------

Text mining question 2: Are terms more strongly associated with the diseases in one cluster than in the other, based on literature strength? Are the terms themselves related to each other in the literature? Expected answer: Yes, the literature associations should confirm the relationships.

Summary

  1. Again, cluster 1 is strongly distinct; cluster 2 is less so. The associated histone marks all appear to be active.

|        | C1 | C2 | C3 | C4 |
|--------|----|----|----|----|
| **C1** |    | Cell types: Gm12878, CD20+. Reg: H3K4me1, H3K9me3, H3K9ac, H3K27ac, H2az, H3K4me2 | Cell types: Gm12878, CD20+. Reg: H3K4me1, H3K9me3, H3K9ac, H3K27ac, H2az, H3K4me2 | Cell types: Gm12878, CD20+. Reg: H3K4me1, H3K9me3, H3K9ac, H3K27ac, H2az, H3K4me2 |
| **C2** |    |    | Cell types: K562, NHEK, NHDF-Ad, NH-A, HMEC. Reg: H3K36me3, H4K20me1, H3K79me2 | Nothing significant |
| **C3** |    |    |    | Nothing significant |
| **C4** |    |    |    |    |

Analysis of all regulatory datasets

Here we use all regulatory datasets rather than a subset. The goal is to obtain potentially tighter clustering.

## [1] 4498   39
## [1] 2969   39

We check how regulatory similarity correlates with overlap similarity.

##      x    y
## x 1.00 0.33
## y 0.33 1.00
## 
## n= 1482 
## 
## 
## P
##   x  y 
## x     0
## y  0
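
A minimal sketch of how such a matrix-to-matrix comparison could be set up, assuming two symmetric disease-by-disease matrices in the same row and column order. All data below are simulated, the names `enrich`, `reg_sim` and `ovl_sim` are hypothetical, and base R `cor.test()` is used as a stand-in because the function that produced the output above is not stated. With 39 diseases, dropping the diagonal leaves 39 × 38 = 1482 ordered pairs, which is consistent with the n reported above.

```r
set.seed(1)
n_dis <- 39  # number of diseases (columns of the enrichment matrix)

# Simulated enrichment profiles: rows = regulatory datasets, columns = diseases
enrich <- matrix(rnorm(200 * n_dis), ncol = n_dis,
                 dimnames = list(NULL, paste0("disease", seq_len(n_dis))))

# Regulatory similarity: disease-disease correlation of enrichment profiles
reg_sim <- cor(enrich)

# Hypothetical overlap similarity: reg_sim plus symmetric noise
noise   <- matrix(rnorm(n_dis^2, sd = 0.2), n_dis)
ovl_sim <- reg_sim + (noise + t(noise)) / 2

# Keep only off-diagonal cells (both triangles, i.e. ordered pairs)
off_diag <- row(reg_sim) != col(reg_sim)
x <- reg_sim[off_diag]
y <- ovl_sim[off_diag]

res     <- cor.test(x, y, method = "pearson")
n_pairs <- length(x)

cat("r =", round(unname(res$estimate), 2), " n =", n_pairs, "\n")
```

Because both triangles are used, every unordered pair enters twice, so the observations are not independent and the p-value should be read with caution.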

Next, we visualize a heatmap of regulatory similarity.

The 10 pairs of disease-associated SNP sets that are most similar to each other are listed below.

## 
## --------------------------------------------------------------------------------------------
##                   Disease 1                            Disease 2          Corr. coefficient 
## ---------------------------------------------- ------------------------- -------------------
##                HDL_cholesterol                       Triglycerides              0.473       
## 
##                LDL_cholesterol                       Triglycerides             0.4314       
## 
##             Chronic_kidney_disease                   Urate_levels              0.3742       
## 
##                HDL_cholesterol                      LDL_cholesterol            0.3475       
## 
##              Bone_mineral_density                   Type_2_diabetes            0.3225       
## 
##               Multiple_sclerosis               Primary_biliary_cirrhosis        0.316       
## 
##              Alzheimers_combined                    Type_2_diabetes            0.2999       
## 
## Liver_enzyme_levels_gamma_glutamyl_transferase       Urate_levels              0.2976       
## 
##         Fasting_glucose_related_traits              Type_2_diabetes            0.2972       
## 
## Liver_enzyme_levels_gamma_glutamyl_transferase      Platelet_counts            0.2944       
## --------------------------------------------------------------------------------------------
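
One way such a ranking could be extracted from a similarity matrix is sketched here. The disease names are borrowed from the table above only as labels; the enrichment profiles are simulated and the object names (`sim`, `pairs`, `top10`) are hypothetical.

```r
set.seed(2)
diseases <- c("HDL_cholesterol", "LDL_cholesterol", "Triglycerides",
              "Urate_levels", "Type_2_diabetes", "Platelet_counts")

# Simulated enrichment profiles; real input would be the regulatory matrix
enrich <- matrix(rnorm(100 * length(diseases)), ncol = length(diseases),
                 dimnames = list(NULL, diseases))
sim <- cor(enrich)

# Take the upper triangle so each unordered pair is listed once
idx <- which(upper.tri(sim), arr.ind = TRUE)
pairs <- data.frame(
  disease1 = rownames(sim)[idx[, "row"]],
  disease2 = colnames(sim)[idx[, "col"]],
  corr     = sim[idx],
  stringsAsFactors = FALSE
)

# Sort by decreasing correlation and keep the first 10 rows
top10 <- head(pairs[order(pairs$corr, decreasing = TRUE), ], 10)
print(top10, row.names = FALSE)
```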

The similarity dendrogram can be divided into separate groups:

## Cluster01 has  14 members 
## Platelet_counts
## Liver_enzyme_levels_gamma_glutamyl_transferase
## Red_blood_cell_traits
## LDL_cholesterol
## HDL_cholesterol
## Triglycerides
## Type_2_diabetes
## Fasting_glucose_related_traits
## Bone_mineral_density
## Alzheimers_combined
## Creatinine_levels
## Renal_function_related_traits_BUN
## Urate_levels
## Chronic_kidney_disease
##  
## Cluster02 has   9 members 
## Multiple_sclerosis
## Kawasaki_disease
## Celiac_disease
## Systemic_lupus_erythematosus
## Psoriasis
## Ulcerative_colitis
## Rheumatoid_arthritis
## Crohns_disease
## Autoimmune_thyroiditis
##  
## Cluster03 has   5 members 
## Primary_biliary_cirrhosis
## Ankylosing_spondylitis
## Systemic_sclerosis
## Migraine
## Primary_sclerosing_cholangitis
##  
## Cluster04 has  11 members 
## Juvenile_idiopathic_arthritis
## Atopic_dermatitis
## Alopecia_areata
## C_reactive_protein
## Allergy
## Type_1_diabetes
## Vitiligo
## Behcets_disease
## Progressive_supranuclear_palsy
## Restless_legs_syndrome
## Asthma
## 
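
The grouping step can be sketched as hierarchical clustering followed by a tree cut. This is an illustration on simulated data, not the code that produced the listing above: the distance (1 minus correlation), the linkage method and k = 4 are all assumptions.

```r
set.seed(3)
n_dis <- 12
enrich <- matrix(rnorm(150 * n_dis), ncol = n_dis,
                 dimnames = list(NULL, paste0("disease", seq_len(n_dis))))
sim <- cor(enrich)

# Convert similarity to a distance, build the dendrogram, cut into k groups
d  <- as.dist(1 - sim)
hc <- hclust(d, method = "ward.D2")  # linkage choice is an assumption
k  <- 4
groups <- cutree(hc, k = k)

# Print the members of each group, one per line
for (g in seq_len(k)) {
  members <- names(groups)[groups == g]
  cat(sprintf("Cluster%02d has %3d members\n", g, length(members)))
  cat(members, sep = "\n")
  cat("\n")
}
```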

The “Enrichment 1/2” columns (labeled with the group names, e.g. c1 and c2, in the tables below) show the average p-values of the group-specific SNP-regulatory associations. A “-” sign indicates that an association is underrepresented. The “p-value” column (adj.P.Val in the tables below) shows whether the difference in the associations between the groups is statistically significant.
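
A hedged sketch of a between-group comparison of this kind follows. The `adj.P.Val` column name is what limma's `topTable()` produces, but the method is not stated here, so the sketch swaps in a row-wise Welch t-test with Benjamini-Hochberg adjustment. The signed -log10 transform, the 0.1 cutoff, the simulated data and all object names are assumptions made for illustration.

```r
set.seed(4)

# Signed p-values: a negative sign marks under-representation.
# Assumed transform: signed -log10, positive = enriched, negative = depleted.
signed_log10 <- function(p) -log10(abs(p)) * sign(p)

n_feat <- 50
g1 <- paste0("d", 1:6)    # diseases in group 1 (hypothetical)
g2 <- paste0("d", 7:12)   # diseases in group 2 (hypothetical)

# Simulated signed p-values: rows = regulatory datasets, columns = diseases
p <- matrix(runif(n_feat * 12, 0.001, 1) * sample(c(-1, 1), n_feat * 12, TRUE),
            nrow = n_feat,
            dimnames = list(paste0("feature", 1:n_feat), c(g1, g2)))
# Make the first 5 datasets strongly enriched in group 1 only
p[1:5, g1] <- runif(30, 1e-6, 1e-4)

score <- signed_log10(p)

# Row-wise Welch t-test between the two groups, then BH adjustment
pvals <- apply(score, 1, function(r) t.test(r[g1], r[g2])$p.value)
res <- data.frame(
  g1_mean = rowMeans(score[, g1]),
  g2_mean = rowMeans(score[, g2]),
  adj_p   = p.adjust(pvals, method = "BH"),
  row.names = rownames(score)
)
res   <- res[order(res$adj_p), ]
n_sig <- sum(res$adj_p < 0.1)  # cutoff chosen for illustration only

print(head(res, 5))
```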

## [1] "c1 vs. c2 , number of degs significant at adj.p.val<0.5: 116"
## 
## ----------------------------------------------------------------------------------------------------------------
##                     Row.names                          c1        c2      adj.P.Val               V2             
## -------------------------------------------------- ---------- --------- ----------- ----------------------------
##  wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1   -0.0696   0.0001995  1.664e-05     GM12878 NFIC v042211.1   
##                                                                                      ChIP-seq Peaks Rep 1 from  
##                                                                                             ENCODE/HAIB         
## 
##  wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1    -0.1765   0.002657   1.664e-05    GM12878 FOXM1 v042211.1   
##                                                                                      ChIP-seq Peaks Rep 1 from  
##                                                                                             ENCODE/HAIB         
## 
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2  -0.2057    0.00165   1.664e-05    GM12878 RUNX3 v042211.1   
##                                                                                      ChIP-seq Peaks Rep 2 from  
##                                                                                             ENCODE/HAIB         
## 
##          wgEncodeOpenChromFaireGm12892Pk            -0.2289   0.001429   2.372e-05    GM12892 FAIRE Peaks from  
##                                                                                        ENCODE/OpenChrom(UNC)    
## 
##  wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2    -0.1244   0.0004603  2.849e-05     GM12878 PML v042211.1    
##                                                                                      ChIP-seq Peaks Rep 2 from  
##                                                                                             ENCODE/HAIB         
## 
##  wgEncodeHaibTfbsGm12878Mta3sc81325V0422111PkRep2   -0.04666  6.729e-05  3.617e-05     GM12878 MTA3 v042211.1   
##                                                                                      ChIP-seq Peaks Rep 2 from  
##                                                                                             ENCODE/HAIB         
## 
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2  -0.2152   0.001387   3.617e-05    GM12878 STAT5A v042211.1  
##                                                                                      ChIP-seq Peaks Rep 2 from  
##                                                                                             ENCODE/HAIB         
## 
##  wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1   -0.1962   0.002065   3.617e-05     GM12878 ATF2 v042211.1   
##                                                                                      ChIP-seq Peaks Rep 1 from  
##                                                                                             ENCODE/HAIB         
## 
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep1  -0.1173   0.001776   4.267e-05    GM12878 RUNX3 v042211.1   
##                                                                                      ChIP-seq Peaks Rep 1 from  
##                                                                                             ENCODE/HAIB         
## 
##      wgEncodeBroadHistoneGm12878H3k9me3StdPk       -2.144e-06 7.237e-08  4.636e-05  GM12878 H3K9me3 Histone Mods
##                                                                                        by ChIP-seq Peaks from   
##                                                                                             ENCODE/Broad        
## ----------------------------------------------------------------------------------------------------------------
## 
## [1] "c1 vs. c3 , number of degs significant at adj.p.val<0.5: 1"
## 
## ---------------------------------------------------------------------------------------------------
##                Row.names                   c1       c3      adj.P.Val               V2             
## ---------------------------------------- ------- --------- ----------- ----------------------------
## wgEncodeBroadHistoneMonocd14ro1746CtcfPk -0.6355 4.879e-06   0.01794   Monocytes CD14+ CTCF Histone
##                                                                        Mods by ChIP-seq Peaks from 
##                                                                                ENCODE/Broad        
## ---------------------------------------------------------------------------------------------------
## 
## [1] "c2 vs. c4 , number of degs significant at adj.p.val<0.5: 76"
## 
## ----------------------------------------------------------------------------------------------------------
##                     Row.names                         c2       c4     adj.P.Val             V2            
## -------------------------------------------------- --------- ------- ----------- -------------------------
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2  0.00165  0.8762   0.005057    GM12878 RUNX3 v042211.1 
##                                                                                  ChIP-seq Peaks Rep 2 from
##                                                                                         ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1  0.0001995 0.8082   0.005057    GM12878 NFIC v042211.1  
##                                                                                  ChIP-seq Peaks Rep 1 from
##                                                                                         ENCODE/HAIB       
## 
##          wgEncodeOpenChromFaireGm12892Pk           0.001429  0.8088   0.005057   GM12892 FAIRE Peaks from 
##                                                                                    ENCODE/OpenChrom(UNC)  
## 
##  wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1   0.002657  0.8098   0.005057    GM12878 FOXM1 v042211.1 
##                                                                                  ChIP-seq Peaks Rep 1 from
##                                                                                         ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2   0.0004603 0.8089   0.005801     GM12878 PML v042211.1  
##                                                                                  ChIP-seq Peaks Rep 2 from
##                                                                                         ENCODE/HAIB       
## 
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep1 0.007786  -0.9993   0.00689   GM12878 STAT5A v042211.1 
##                                                                                  ChIP-seq Peaks Rep 1 from
##                                                                                         ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Mta3sc81325V0422111PkRep2  6.729e-05 -0.9847   0.00689    GM12878 MTA3 v042211.1  
##                                                                                  ChIP-seq Peaks Rep 2 from
##                                                                                         ENCODE/HAIB       
## 
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2 0.001387  0.6498    0.00689   GM12878 STAT5A v042211.1 
##                                                                                  ChIP-seq Peaks Rep 2 from
##                                                                                         ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1  0.002065  0.7675    0.00689    GM12878 ATF2 v042211.1  
##                                                                                  ChIP-seq Peaks Rep 1 from
##                                                                                         ENCODE/HAIB       
## 
##  wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep2  0.007315  0.8856    0.00899    GM12878 ATF2 v042211.1  
##                                                                                  ChIP-seq Peaks Rep 2 from
##                                                                                         ENCODE/HAIB       
## ----------------------------------------------------------------------------------------------------------
## 
## [1] "c3 vs. c4 , number of degs significant at adj.p.val<0.5: 1"
## 
## --------------------------------------------------------------------------------------------------
##                Row.names                    c3       c4    adj.P.Val               V2             
## ---------------------------------------- --------- ------ ----------- ----------------------------
## wgEncodeBroadHistoneMonocd14ro1746CtcfPk 4.879e-06 0.7054   0.08407   Monocytes CD14+ CTCF Histone
##                                                                       Mods by ChIP-seq Peaks from 
##                                                                               ENCODE/Broad        
## --------------------------------------------------------------------------------------------------
## [1] "Counts of regulatory elements differentially associated with each group"
## 
## ----------------------------
##  &nbsp;   c1   c2   c3   c4 
## -------- ---- ---- ---- ----
##  **c1**   0   116   1    0  
## 
##  **c2**   0    0    0    76 
## 
##  **c3**   0    0    0    1  
## 
##  **c4**   0    0    0    0  
## ----------------------------

Summary

The picture is less clear than when we use subsets of the regulatory datasets.

| Group | c1 | c2 | c3 | c4 |
|-------|----|----|----|----|
| Cluster01 (14 members): Platelet_counts, Liver_enzyme_levels_gamma_glutamyl_transferase, Red_blood_cell_traits, LDL_cholesterol, HDL_cholesterol, Triglycerides, Type_2_diabetes, Fasting_glucose_related_traits, Bone_mineral_density, Alzheimers_combined, Creatinine_levels, Renal_function_related_traits_BUN, Urate_levels, Chronic_kidney_disease | | 416 total up in C2. Cell types: Gm, B cells. Factors: NFIC, FOXM1, RUNX3, CEBPB and other TFBSs; DNase HS | 1 total up in C3. Cell types: Monocytes CD14+. Factors: CTCF | |
| Cluster02 (9 members): Multiple_sclerosis, Kawasaki_disease, Celiac_disease, Systemic_lupus_erythematosus, Psoriasis, Ulcerative_colitis, Rheumatoid_arthritis, Crohns_disease, Autoimmune_thyroiditis | | | | 96 total up in C2. Cell types: B cells, Gm. Factors: RUNX3, NFIC, FOXM1 and other TFBSs; DNase HS |
| Cluster03 (5 members): Primary_biliary_cirrhosis, Ankylosing_spondylitis, Systemic_sclerosis, Migraine, Primary_sclerosing_cholangitis | | | | 1 total up in C3. Cell types: Monocytes CD14+. Factors: CTCF |
| Cluster04 (11 members): Juvenile_idiopathic_arthritis, Atopic_dermatitis, Alopecia_areata, C_reactive_protein, Allergy, Type_1_diabetes, Vitiligo, Behcets_disease, Progressive_supranuclear_palsy, Restless_legs_syndrome, Asthma | | | | |

Distribution of maxMin correlation coefficients

Regulatory- and co-morbidity similarities

To evaluate whether regulatory and co-morbidity measurements correlate, a matrix of disease-disease co-morbidity correlations (AllNet3.txt) is downloaded.

We create square matrices (14x14) of disease-disease co-morbidity correlations and regulatory correlations.

To evaluate the agreement between the two measurements, the matrices are correlated with each other. The output consists of a matrix of correlation coefficients, the total number of pairs used, and a matrix of p-values.

The ongoing debate is whether to remove or keep self-self associations.
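
To illustrate why the choice matters, the sketch below correlates two simulated, mutually independent correlation matrices with and without their diagonals. The matrices `a` and `b` are hypothetical stand-ins for the regulatory and co-morbidity matrices. The size of 39 diseases is an assumption, chosen so that the full matrix has 39 × 39 = 1521 cells and the off-diagonal part has 1482.

```r
set.seed(5)
n_dis <- 39

# Two independent correlation matrices (stand-ins for the real data)
a <- cor(matrix(rnorm(100 * n_dis), ncol = n_dis))
b <- cor(matrix(rnorm(100 * n_dis), ncol = n_dis))

# Keeping self-self entries: all cells, including the diagonal of ones
r_with <- cor(as.vector(a), as.vector(b))

# Removing self-self entries: off-diagonal cells only
off       <- row(a) != col(a)
r_without <- cor(a[off], b[off])

cat("with diagonal:    r =", round(r_with, 2), " n =", length(a), "\n")
cat("without diagonal: r =", round(r_without, 2), " n =", sum(off), "\n")
```

In this simulation the off-diagonal values are unrelated, yet the shared diagonal of ones alone produces a substantial positive correlation. That is an argument for reporting the off-diagonal correlation at least alongside the full one.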

## [1] "Co-occurrence"
##     x   y
## x 1.0 0.3
## y 0.3 1.0
## 
## n= 1521 
## 
## 
## P
##   x  y 
## x     0
## y  0
## [1] "Relative risk"
##      x    y
## x 1.00 0.38
## y 0.38 1.00
## 
## n= 1521 
## 
## 
## P
##   x  y 
## x     0
## y  0
## [1] "Phi-correlation"
##      x    y
## x 1.00 0.41
## y 0.41 1.00
## 
## n= 1521 
## 
## 
## P
##   x  y 
## x     0
## y  0

The regulatory and co-morbidity-based (Phi-correlation) disease-disease correlations agree with each other at a Pearson’s correlation coefficient of 0.41 (self-correlations kept; p-value printed as 0). Using “relative risk” co-morbidity correlations produces a similar result (0.38).

Iridescent literature similarity

## [1] "sharedRels correlation with episimilarity"
## 
## ---------------------
## &nbsp;    x      y   
## ------- ------ ------
##  **x**    1    0.3701
## 
##  **y**  0.3701   1   
## ---------------------
## 
## [1] "obsExp correlation with episimilarity"
## 
## ---------------------
## &nbsp;    x      y   
## ------- ------ ------
##  **x**    1    0.5449
## 
##  **y**  0.5449   1   
## ---------------------
## 
## [1] "directStr correlation with episimilarity"
## 
## ---------------------
## &nbsp;    x      y   
## ------- ------ ------
##  **x**    1    0.2504
## 
##  **y**  0.2504   1   
## ---------------------
## 
## [1] "relOverlap correlation with episimilarity"
## 
## ---------------------
## &nbsp;    x      y   
## ------- ------ ------
##  **x**    1    0.3825
## 
##  **y**  0.3825   1   
## ---------------------
## 
## [1] "misn correlation with episimilarity"
## 
## -------------------
## &nbsp;    x     y  
## ------- ----- -----
##  **x**    1   0.725
## 
##  **y**  0.725   1  
## -------------------